In this article we propose a novel approach to reduce the computational complexity of various approximation methods for pricing discrete time American options. Given a sequence of continuation value estimates corresponding to different levels of spatial approximation and time discretization, we propose a multi-level low-biased estimate for the price of an American option. It turns out that the resulting complexity gain can be unexpectedly high and can even reach the order $\varepsilon^{-2}$, with $\varepsilon$ denoting the desired precision. The performance of the proposed multilevel algorithm is illustrated by a numerical example of pricing Bermudan max-call options.

1 Introduction 

Pricing an American option usually reduces to solving an optimal stopping problem, which can be solved efficiently in low dimensions via a dynamic programming algorithm. However, many problems arising in practice (see, e.g., Glasserman (2004)) are high-dimensional, and these applications have motivated the development of Monte Carlo methods for pricing American options. Pricing American-style derivatives via Monte Carlo is a challenging task because it requires a backward dynamic programming algorithm that seems to be incompatible with the forward structure of Monte Carlo methods. In recent years much research has focused on the development of fast methods to compute approximations to the optimal exercise policy. Eminent examples include the functional optimization approach of Andersen (2000), the mesh method of Broadie and Glasserman (1997), and the regression-based approaches of Carriere (1996), Longstaff and Schwartz (2001), Tsitsiklis and Van Roy (1999), Egloff (2005) and Belomestny (2011).

The complexity of these fast approximation algorithms depends on the desired precision $\varepsilon$ in a quite nonlinear way, which in turn is determined by some fine properties of the underlying exercise boundary and the continuation values (see, e.g., Belomestny (2011)). In some situations (e.g. in the case of the stochastic mesh method or local regression) this complexity is of order $\varepsilon^{-4}$, which is rather high. One way to reduce the complexity of the fast approximation methods is to use variance reduction methods. However, the latter are often ad hoc and, more importantly, do not lead to a provably reduced asymptotic complexity.

In this paper we propose a generic approach which is able to reduce the order of the asymptotic complexity and which is applicable to various fast approximation methods, such as global regression, local regression or the stochastic mesh method. The main idea of the method is inspired by the pathbreaking work of Giles (2008), which introduced the multilevel idea into stochastics. Similarly to the recent work of Belomestny et al. (2012), we consider not only levels corresponding to different discretization steps but also levels related to different degrees of approximation of the continuation values. For example, in the case of the Longstaff-Schwartz algorithm the latter degree is basically governed by the number of basis functions, and in the case of the mesh method by the number of "training paths" used to approximate the continuation values. The new multi-level approach is able to significantly reduce the complexity of the fast approximation methods, leading in some cases to a complexity gain of order $\varepsilon^{-2}$.

The paper is organised as follows. In Section 2 the pricing problem is formulated and the main assumptions are introduced and illustrated. In Section 3 the complexity analysis of a generic approximation algorithm is carried out. The main multi-level Monte Carlo algorithm is introduced in Section 4, where its complexity is also studied. In Section 5 we numerically test our approach on the problem of pricing Bermudan max-call options via the mesh method. The proofs are collected in Section 6.

2 Main setup 

An American option grants the holder the right to select the time at which to exercise the option, and in this it differs from a European option, which may be exercised only at a fixed date. A general class of American option pricing problems can be formulated through an $\mathbb{R}^d$-valued Markov process $\{X_t,\ 0 \le t \le T\}$ defined on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{0 \le t \le T}, P)$. It is assumed that the process $(X_t)$ is adapted to $(\mathcal{F}_t)_{0 \le t \le T}$ in the sense that each $X_t$ is $\mathcal{F}_t$-measurable. Recall that each $\mathcal{F}_t$ is a $\sigma$-algebra of subsets of $\Omega$ such that $\mathcal{F}_s \subseteq \mathcal{F}_t \subseteq \mathcal{F}$ for $s \le t$. We restrict attention to options admitting a finite set of exercise opportunities $0 = t_0 < t_1 < t_2 < \ldots < t_{\mathcal{J}} = T$, sometimes called Bermudan options. Then

$$Z_j := X_{t_j}, \quad j = 0, \ldots, \mathcal{J},$$

is a Markov chain. If exercised at time $t_j$, $j = 1, \ldots, \mathcal{J}$, the option pays $g_j(Z_j)$, for some known functions $g_0, g_1, \ldots, g_{\mathcal{J}}$ mapping $\mathbb{R}^d$ into $[0, \infty)$. Let $\mathcal{T}_j$ denote the set of stopping times taking values in $\{j, j+1, \ldots, \mathcal{J}\}$. A standard result in the theory of contingent claims states that the equilibrium price $V_j^*(z)$ of the American option at time $t_j$ in state $z$, given that the option was not exercised prior to $t_j$, is its value under an optimal exercise policy:

$$V_j^*(z) = \sup_{\tau \in \mathcal{T}_j} E[g_\tau(Z_\tau) \mid Z_j = z], \quad z \in \mathbb{R}^d.$$
A common feature of all fast approximation algorithms is that they can deliver estimates $C_{k,0}(z), \ldots, C_{k,\mathcal{J}-1}(z)$ for the so-called continuation values:

$$C_j^*(z) := E[V_{j+1}^*(Z_{j+1}) \mid Z_j = z], \quad j = 0, \ldots, \mathcal{J}-1, \tag{2.1}$$

based on the set of trajectories $(Z_0^{(i)}, \ldots, Z_{\mathcal{J}}^{(i)})$, $i = 1, \ldots, k$, all starting from one point, i.e., $Z_0^{(1)} = \ldots = Z_0^{(k)}$. In the case of the so-called regression methods, the estimates for the continuation values are obtained via the recursion (dynamic programming principle):

$$C_{\mathcal{J}}^*(z) = 0, \qquad C_j^*(z) = E[\max(g_{j+1}(Z_{j+1}), C_{j+1}^*(Z_{j+1})) \mid Z_j = z],$$

combined with Monte Carlo: at the $(\mathcal{J}-j)$th step one estimates the expectation

$$E[\max(g_{j+1}(Z_{j+1}), C_{k,j+1}(Z_{j+1})) \mid Z_j = z] \tag{2.2}$$

by regression (global or local) based on the sample

$$\big(Z_j^{(i)}, C_{k,j+1}(Z_{j+1}^{(i)})\big), \quad i = 1, \ldots, k,$$

where $C_{k,j+1}(z)$ is an estimate for $C_{j+1}^*(z)$ obtained in the previous step. Another way to approximate the continuation values $C_0^*(z), \ldots, C_{\mathcal{J}-1}^*(z)$ is to maximize a Monte Carlo estimate of the expectation $E[g_{\tau_\theta}(Z_{\tau_\theta}) \mid Z_j = z]$ based on $k$ paths of $Z$ over a vector of parameters $\theta = (\theta_1, \ldots, \theta_{\mathcal{J}}) \in \Theta^{\mathcal{J}}$ with

$$\tau_\theta = \min\{0 \le l \le \mathcal{J} : \varphi(Z_l, \theta_l) \le g_l(Z_l)\},$$

where $\varphi$ is a predefined function on $\mathbb{R}^d \times \Theta$. In this way one gets an estimate $\theta_k = (\theta_{k,1}, \ldots, \theta_{k,\mathcal{J}})$ and defines $C_{k,j}(z) = \varphi(z, \theta_{k,j})$.

Let us now consider a generic family of estimates $C_{k,0}(z), \ldots, C_{k,\mathcal{J}-1}(z)$, with a natural number $k$ determining the quality of the estimates as well as their complexity. In particular, we make the following assumptions.

(AP) For any $k \in \mathbb{N}$ the estimates $C_{k,0}(z), \ldots, C_{k,\mathcal{J}-1}(z)$ are defined on some filtered probability space $(\Omega^k, \mathcal{F}^k, P^k)$ which is independent of $(\Omega, \mathcal{F}, P)$.

(AC) For any $j = 1, \ldots, \mathcal{J}$ and for any fixed $z \in \mathbb{R}^d$, the estimate $C_{k,j}(z)$ has numerical complexity of order $k^{\varkappa}$ for some $\varkappa > 0$.

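(AQ) There is a sequence of positive real numbers $\gamma_k$ with $\gamma_k \to 0$, $k \to \infty$, such that

$$P^k\Big(\sup_z |C_{k,j}(z) - C_j^*(z)| > \eta\sqrt{\gamma_k}\Big) < B_1 e^{-B_2 \eta}, \quad \eta > 0,$$

for some constants $B_1 > 0$ and $B_2 > 0$.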
Let us now illustrate the above assumptions for three well-known approximation methods.

Example 1 (Global regression). Fix a vector of real-valued functions $\psi = (\psi_1, \ldots, \psi_L)$ on $\mathbb{R}^d$. Let $a_j^k = (a_{j,1}^k, \ldots, a_{j,L}^k)$ be a solution of the following least squares optimization problem:

$$\arg\inf_{a \in \mathbb{R}^L} \sum_{i=1}^k \Big[\zeta_{j+1,k}(Z_{j+1}^{(i)}) - a_1\psi_1(Z_j^{(i)}) - \ldots - a_L\psi_L(Z_j^{(i)})\Big]^2 \tag{2.3}$$

with $\zeta_{j+1,k}(z) = \max\{g_{j+1}(z), C_{k,j+1}(z)\}$. Define an approximation for $C_j^*$ via

$$C_{k,j}(z) = a_{j,1}^k\psi_1(z) + \ldots + a_{j,L}^k\psi_L(z), \quad z \in \mathbb{R}^d.$$

It is clear that all estimates $C_{k,j}$ are well defined on the cartesian product of $k$ copies of $(\Omega, \mathcal{F}, P)$. The complexity $\mathrm{comp}(a_j^k)$ of computing $a_j^k$ is of order $k \cdot L^2 + \mathrm{comp}(a_{j+1}^k)$, since each $a_j^k$ is of the form $a_j^k = B^{-1}b$ with

$$B_{p,q} = \frac{1}{k}\sum_{i=1}^k \psi_p(Z_j^{(i)})\,\psi_q(Z_j^{(i)})$$

and

$$b_p = \frac{1}{k}\sum_{i=1}^k \psi_p(Z_j^{(i)})\,\zeta_{j+1,k}(Z_{j+1}^{(i)}),$$

$p, q \in \{1, \ldots, L\}$. Hence $\mathrm{comp}(a_j^k) \sim (\mathcal{J} - j) \cdot k \cdot L^2$. Furthermore, it can be shown that the estimates $C_{k,0}(z), \ldots, C_{k,\mathcal{J}-1}(z)$ satisfy the assumption (AQ) under some regularity conditions (see, e.g., Egloff (2005)), provided $L$ increases with $k$ at a logarithmic rate.
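One backward step of the global regression in Example 1 can be sketched as follows. This is only an illustration under stated assumptions: the function names `regression_step` and `continuation_value` and the `basis` callable are hypothetical, and NumPy's least-squares solver is used in place of forming $B^{-1}b$ explicitly.

```python
import numpy as np

def regression_step(z_j, target, basis):
    """Fit the coefficients of (2.3) for one exercise date j.

    z_j    : (k, d) array with the states Z_j^{(i)} of the k training paths
    target : (k,) array with zeta_{j+1,k}(Z_{j+1}^{(i)})
    basis  : hypothetical callable mapping (k, d) states to a (k, L) design matrix
    """
    psi = basis(z_j)
    # Least-squares solution of (2.3); equivalent to a = B^{-1} b when B is invertible.
    coef, *_ = np.linalg.lstsq(psi, target, rcond=None)
    return coef

def continuation_value(z, coef, basis):
    """Evaluate C_{k,j}(z) = sum_p a_p * psi_p(z) at the rows of z."""
    return basis(z) @ coef
```

Applying `regression_step` for $j = \mathcal{J}-1, \ldots, 0$, with `target` rebuilt at each date from the previous fit, gives the full backward recursion.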

Example 2 (Local regression). Local polynomial regression estimates can be defined as follows. Fix some $j$ such that $0 \le j < \mathcal{J}$ and suppose that we want to compute the expectation in (2.2):

$$E[\zeta_{j+1,k}(Z_{j+1}) \mid Z_j = z], \quad z \in \mathbb{R}^d,$$

with $\zeta_{j+1,k}(z) = \max\{g_{j+1}(z), C_{k,j+1}(z)\}$. For some $\delta > 0$, $z \in \mathbb{R}^d$, an integer $l \ge 0$ and a function $K : \mathbb{R}^d \to \mathbb{R}_+$, denote by $q_{z,k}$ a polynomial on $\mathbb{R}^d$ of degree $l$ (i.e. the maximal order of the multi-index is less than or equal to $l$) which minimizes

$$\sum_{i=1}^k \Big[\zeta_{j+1,k}(Z_{j+1}^{(i)}) - q(Z_j^{(i)} - z)\Big]^2 K\Big(\frac{Z_j^{(i)} - z}{\delta}\Big) \tag{2.4}$$

over the set of all polynomials $q$ of degree $l$. The local polynomial estimator of order $l$ for $C_j^*(z)$ is then defined as $C_{k,j}(z) = q_{z,k}(0)$ if $q_{z,k}$ is the unique minimizer of (2.4) and $C_{k,j}(z) = 0$ otherwise. Hence, for any $j = 0, \ldots, \mathcal{J}-1$, the complexity of computing the value $C_{k,j}(z)$ is of order $k$ as $k \to \infty$. The value $\delta$ is called the bandwidth and the function $K$ is called the kernel function. In Belomestny (2011) it is shown that the local polynomial estimates $C_{k,0}(z), \ldots, C_{k,\mathcal{J}-1}(z)$ of degree $l$ satisfy the assumption (AQ) under some regularity conditions, provided $\delta = k^{-1/(2l+d)}$.

Example 3 (Mesh method). In the mesh method of Broadie and Glasserman (2004) the continuation value $C_j^*$ at a point $z$ is approximated via

$$C_{k,j}(z) = \frac{1}{k}\sum_{i=1}^k \zeta_{j+1,k}(Z_{j+1}^{(i)})\, w_{ij}(z),$$

where $\zeta_{j+1,k}(z) = \max\{g_{j+1}(z), C_{k,j+1}(z)\}$ and

$$w_{ij}(z) = \frac{p_j(z, Z_{j+1}^{(i)})}{\frac{1}{k}\sum_{l=1}^k p_j(Z_j^{(l)}, Z_{j+1}^{(i)})},$$

where $p_j(x, y)$ is the transition density from $Z_j = x$ to $Z_{j+1} = y$. Hence, for any $j = 0, \ldots, \mathcal{J}-1$, the complexity of computing $C_{k,j}(z)$ is of order $k$, provided the transition density $p_j(x, y)$ is analytically known.
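The mesh estimate of Example 3 can be sketched as below, assuming the standard weights given by a transition density divided by the average density over the mesh nodes at date $j$. The function name `mesh_continuation` and the `density` callable are hypothetical; a practical implementation would cache the denominators, which do not depend on the evaluation point $z$.

```python
import numpy as np

def mesh_continuation(z, z_j, z_next, zeta_next, density):
    """Mesh estimate C_{k,j}(z) = (1/k) * sum_i zeta_i * w_i(z).

    z         : (d,) evaluation point
    z_j       : (k, d) mesh nodes at date j
    z_next    : (k, d) mesh nodes at date j+1
    zeta_next : (k,) values max(g_{j+1}, C_{k,j+1}) at the nodes z_next
    density   : hypothetical callable density(x, y) -> p_j(x, y), row-wise on (k, d) arrays
    """
    # Numerators p_j(z, Z_{j+1}^{(i)}), one per node at date j+1.
    num = density(np.broadcast_to(z, z_next.shape), z_next)
    # Denominators (1/k) * sum_l p_j(Z_j^{(l)}, Z_{j+1}^{(i)}); independent of z.
    den = np.array([density(z_j, np.broadcast_to(y, z_j.shape)).mean() for y in z_next])
    return float(np.mean(zeta_next * num / den))
```

With the denominators precomputed once per date, each evaluation costs a number of density calls proportional to $k$, in line with the complexity statement above.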

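Based on the estimates $C_{k,0}(z), \ldots, C_{k,\mathcal{J}-1}(z)$ one can construct a lower bound (low-biased estimate) for $V_0^*$ using the (generally suboptimal) stopping rule:

$$\tau_k = \min\{0 \le j \le \mathcal{J} : C_{k,j}(Z_j) \le g_j(Z_j)\}$$

with $C_{k,\mathcal{J}} \equiv 0$ by definition. Fix two natural numbers $N$ and $K$, and simulate $N$ trajectories of the process $Z$. A low-biased estimate for $V_0^*$ can then be defined via

$$V_0^{N,K} = \frac{1}{N}\sum_{r=1}^N g_{\tau_K^{(r)}}\big(Z_{\tau_K^{(r)}}^{(r)}\big),$$

where

$$\tau_K^{(r)} = \inf\{0 \le j \le \mathcal{J} : g_j(Z_j^{(r)}) \ge C_{K,j}(Z_j^{(r)})\}.$$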
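The low-biased estimate can be sketched in a few lines, assuming the testing paths are stored in a single array and that `payoff` and `cont` are hypothetical callables returning $g_j$ and $C_{K,j}$ on a batch of states. The convention $C_{K,\mathcal{J}} \equiv 0$ is applied explicitly at the last date.

```python
import numpy as np

def lower_bound(paths, payoff, cont):
    """Average payoff at the first date j with g_j(Z_j) >= C_{K,j}(Z_j).

    paths  : (N, J+1, d) array of testing trajectories Z_0, ..., Z_J
    payoff : hypothetical callable payoff(j, z) -> (N,) array of g_j(z)
    cont   : hypothetical callable cont(j, z) -> (N,) array of C_{K,j}(z); not used at j = J
    """
    n_paths, n_dates, _ = paths.shape
    value = np.zeros(n_paths)
    alive = np.ones(n_paths, dtype=bool)  # paths that have not been stopped yet
    for j in range(n_dates):
        g = payoff(j, paths[:, j])
        # C_{K,J} = 0 by definition, so every path stops at the last date at the latest.
        c = np.zeros(n_paths) if j == n_dates - 1 else cont(j, paths[:, j])
        stop = alive & (g >= c)
        value[stop] = g[stop]
        alive &= ~stop
    return value.mean()
```

The estimate is low-biased only if `cont` is built from training paths that are independent of `paths`, as required by assumption (AP).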

Discretization. Usually the process $X$, and hence $Z$, cannot be simulated exactly and so-called discretization schemes have to be used. For the sake of concreteness consider a $d$-dimensional diffusion process whose dynamics is given by

$$X_t = x + \int_0^t b(s, X_s)\,ds + \int_0^t \sigma(s, X_s)\,dW_s, \tag{2.5}$$

where $W$ is a standard $d'$-dimensional Brownian motion defined on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, P_x)$ satisfying the usual conditions. The mappings $b$ and $\sigma$ are Lipschitz continuous in space and locally bounded in time, so that (2.5) has a unique strong solution. We approximate the diffusion (2.5) on the grid $0 = t_0 < t_1 < \ldots < t_{\mathcal{J}}$ via the Euler scheme with a discretization step $h > 0$ and discretization points $ih$, $i \in \mathbb{N}$. For $t \ge 0$ we define $\phi(t) = ih$ for $ih \le t < (i+1)h$ and introduce

$$X_{h,t} = x + \int_0^t b(\phi(s), X_{h,\phi(s)})\,ds + \int_0^t \sigma(\phi(s), X_{h,\phi(s)})\,dW_s. \tag{2.6}$$

Put $Z_{h,j} = X_{h,t_j}$, $j = 0, \ldots, \mathcal{J}$. Let now $h_k$, $k \in \mathbb{N}$, be a sequence of discretization steps tending to 0. Define

$$\overline{V}_0^{N,K} = \frac{1}{N}\sum_{r=1}^N g_{\tau_K^{(r)}}\big(Z^{(r)}_{h_K, \tau_K^{(r)}}\big) \tag{2.7}$$

with

$$\tau_K^{(r)} = \inf\{0 \le j \le \mathcal{J} : g_j(Z^{(r)}_{h_K,j}) \ge C_{K,j}(Z^{(r)}_{h_K,j})\}.$$

Although the estimate (2.7) is no longer low-biased due to the discretization error, it can still be viewed as a good approximation for $V_0 := E[g_{\tau_K}(Z_{\tau_K})]$, provided $h_K$ is small enough. In the next section we analyze the numerical complexity of the estimate $\overline{V}_0^{N,K}$.
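A minimal sketch of the Euler scheme (2.6) is given below. It assumes, for simplicity, a diagonal diffusion coefficient (one Brownian component per coordinate), which is narrower than the general $d'$-dimensional setting of (2.5); the function name `euler_paths` and its signature are hypothetical.

```python
import numpy as np

def euler_paths(x0, b, sigma, horizon, n_steps, n_paths, rng):
    """Simulate X_{h,ih}, i = 0, ..., n_steps, with step h = horizon / n_steps.

    b, sigma : callables (t, x) -> array of the same shape as x, with x of shape (n_paths, d);
               sigma acts coordinate-wise (diagonal diffusion, an assumption of this sketch)
    rng      : numpy.random.Generator
    """
    h = horizon / n_steps
    d = len(x0)
    x = np.empty((n_paths, n_steps + 1, d))
    x[:, 0] = x0
    for i in range(n_steps):
        t = i * h
        # Brownian increments over [ih, (i+1)h] have variance h.
        dw = rng.normal(scale=np.sqrt(h), size=(n_paths, d))
        x[:, i + 1] = x[:, i] + b(t, x[:, i]) * h + sigma(t, x[:, i]) * dw
    return x
```

The values $Z_{h,j}$ are obtained by reading the simulated array at the indices corresponding to the exercise dates $t_j$, which requires the $t_j$ to lie on the grid $\{ih\}$.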

3 Complexity analysis of $\overline{V}_0^{N,K}$

In order to carry out the complexity analysis of the estimate (2.7) we need the so-called "margin" or boundary assumption.

(AM) There exist constants $A > 0$, $\delta_0 > 0$ and $\alpha > 0$ such that

$$P\big(|C_j^*(Z_j) - g_j(Z_j)| \le \delta\big) \le A\delta^{\alpha}$$

for all $j = 0, \ldots, \mathcal{J}$, and all $\delta < \delta_0$.

Assumption (AM) provides a useful characterization of the behavior of the continuation values $\{C_j^*\}$ and payoffs $\{g_j\}$ near the exercise boundary $\partial\mathcal{E}$ with

$$\mathcal{E} = \{(j, x) : g_j(x) \ge C_j^*(x)\}.$$

In the situation when all functions $C_j^* - g_j$, $j = 0, \ldots, \mathcal{J}-1$, are smooth and have non-vanishing derivatives in the vicinity of the exercise boundary, we have $\alpha = 1$. Other values of $\alpha$ are possible as well (see Belomestny (2011)). While the variance of the estimate $\overline{V}_0^{N,K}$ is given by

$$\mathrm{Var}[\overline{V}_0^{N,K}] = \mathrm{Var}[g_{\tau_K}(Z_{h_K,\tau_K})]/N,$$

its bias is analyzed in the following theorem.
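Theorem 4. Suppose that (AM) and (AQ) hold, and all functions $g_j$ are uniformly bounded and Lipschitz continuous, i.e.,

$$|g_j(x)| \le G, \quad |g_j(x) - g_j(y)| \le \mathcal{L}_g \|x - y\|, \quad x, y \in \mathbb{R}^d,$$

for some constants $G > 0$ and $\mathcal{L}_g > 0$. Moreover, assume that all continuation functions $C_{k,j}$ are uniformly Lipschitz, i.e.,

$$|C_{k,j}(x) - C_{k,j}(y)| \le \mathcal{L}_C \|x - y\|, \quad x, y \in \mathbb{R}^d,$$

for $j = 0, \ldots, \mathcal{J}$ and $k \in \mathbb{N}$. If

$$\limsup_{\delta \to 0} E\Big[\sup_{1 \le j \le \mathcal{J}} |Z_j|^p \,\Big|\, |C_l^*(Z_l) - g_l(Z_l)| \le \delta\Big] < \infty$$

for $l = 0, \ldots, \mathcal{J}$ and $p > \alpha$, then it holds

$$\big|V_0^* - E[\overline{V}_0^{N,K}]\big| \lesssim \gamma_K^{(1+\alpha)/2} + \Big(\gamma_K \log^2\frac{1}{\gamma_K} \vee h_K \log^2\frac{1}{h_K}\Big)^{\alpha/2} + h_K^{1/2}, \quad K \to \infty.$$

The next theorem gives an upper estimate for the complexity of $\overline{V}_0^{N,K}$.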
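Theorem 5. Let the assumptions of Theorem 4 hold and let

$$\gamma_k = k^{-\mu}, \quad k \in \mathbb{N},$$

for some $\mu > 0$. Then under the choice $h_k = k^{-\beta}$ with

$$\beta = \begin{cases} \mu, & 0 < \alpha < 1, \\ \mu\alpha, & 1 \le \alpha, \end{cases}$$

and for any $\delta > 0$, the complexity of the estimate (2.7), given

$$E\big[(\overline{V}_0^{N,K} - V_0^*)^2\big] \le \varepsilon^2,$$

is bounded from above by the value $\mathcal{C}_{N,K}(\varepsilon)$ with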



_4_^£ 

S 



a> 1, 

2x 

~» a , < a < 1, 



< 



-A- — -5{x+aa) 
-2-^-^-5(x+/x) 



provided e" 2 /^") < it < e - 2 /0^)-s. 



a > 1, 
< a < 1, 



Discussion. Theorem 5 implies that the complexity of the estimate $\overline{V}_0^{N,K}$ can be rather high and can even reach the order $\varepsilon^{-q}$ for arbitrarily large $q > 0$. In the next section we introduce a multilevel approach which is able to reduce the asymptotic complexity order.

4 Multilevel approach



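Fix some natural number $L$ and let $\mathbf{k} = (k_0, \ldots, k_L)$ and $\mathbf{n} = (n_0, \ldots, n_L)$ be two sequences of natural numbers. Define

$$\overline{V}_0^{\mathbf{n},\mathbf{k}} = \frac{1}{n_0}\sum_{r=1}^{n_0} g_{\tau_{k_0}^{(r)}}\big(Z^{(r)}_{h_{k_0}, \tau_{k_0}^{(r)}}\big) + \sum_{l=1}^{L}\frac{1}{n_l}\sum_{r=1}^{n_l}\Big[g_{\tau_{k_l}^{(r)}}\big(Z^{(r)}_{h_{k_l}, \tau_{k_l}^{(r)}}\big) - g_{\tau_{k_{l-1}}^{(r)}}\big(Z^{(r)}_{h_{k_{l-1}}, \tau_{k_{l-1}}^{(r)}}\big)\Big]$$

with

$$\tau_k^{(r)} = \inf\{0 \le j \le \mathcal{J} : g_j(Z^{(r)}_{h_k,j}) \ge C_{k,j}(Z^{(r)}_{h_k,j})\}, \quad k \in \mathbb{N}.$$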



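The following theorem gives the bias and the variance of the estimate $\overline{V}_0^{\mathbf{n},\mathbf{k}}$.

Theorem 6. Let (AQ) and (AM) hold with some $\alpha > 0$. Then the estimate $\overline{V}_0^{\mathbf{n},\mathbf{k}}$ has a bias of the order

$$\gamma_{k_L}^{(1+\alpha)/2} + \Big(\gamma_{k_L}\log^2\frac{1}{\gamma_{k_L}} \vee h_{k_L}\log^2\frac{1}{h_{k_L}}\Big)^{\alpha/2} + h_{k_L}^{1/2}$$

and a variance of the order

$$\frac{\mathrm{Var}\big[g_{\tau_{k_0}}(Z_{h_{k_0},\tau_{k_0}})\big]}{n_0} + \sum_{l=1}^{L}\frac{1}{n_l}\Big[\Big(\gamma_{k_{l-1}}\log^2\frac{1}{\gamma_{k_{l-1}}} \vee h_{k_{l-1}}\log^2\frac{1}{h_{k_{l-1}}}\Big)^{\alpha/2} + h_{k_{l-1}}\Big].$$

Furthermore, under assumption (AC) the complexity of $\overline{V}_0^{\mathbf{n},\mathbf{k}}$ is bounded from above by a multiple of

$$\sum_{l=0}^{L} k_l^{\varkappa}\, n_l\, h_{k_l}^{-1}.$$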
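The telescoping structure of the multilevel estimate can be sketched as follows. The callable `level_payoffs` is hypothetical: for level $l$ it is assumed to simulate $n_l$ fresh testing paths and to return the stopped payoffs under the level-$l$ and level-$(l-1)$ stopping rules evaluated on the same paths (with zeros as the coarse payoffs at level 0).

```python
import numpy as np

def multilevel_estimate(level_payoffs, n):
    """Sum of level-wise sample means of (fine payoff - coarse payoff).

    level_payoffs : hypothetical callable (l, n_l) -> (fine, coarse), two (n_l,) arrays
                    computed on the same n_l testing paths; coarse is all zeros for l = 0
    n             : sequence (n_0, ..., n_L) of sample sizes per level
    """
    total = 0.0
    for l, n_l in enumerate(n):
        fine, coarse = level_payoffs(l, n_l)
        # Using common paths for fine and coarse payoffs keeps the level variance small.
        total += float(np.mean(fine - coarse))
    return total
```

In expectation the sum telescopes to the expected payoff under the finest stopping rule, while most samples are spent on the cheap coarse levels.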



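Finally, the complexity of $\overline{V}_0^{\mathbf{n},\mathbf{k}}$ is given by the following theorem.

Theorem 7. Let the assumptions of Theorem 4 hold and let

$$\gamma_k = k^{-\mu}, \quad k \in \mathbb{N},$$

for some $\mu > 0$. Then under the choice $h_k = k^{-\beta}$ and $k_l = k_0\kappa^l$, $l = 0, 1, \ldots, L$,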



with 



1 pi, 0<a<lM<a<2V-<l, 



/S = -( pia, l<aV^>l, 
2<aV^<l, 



L = < 



J2_ 

pa 
2 



log K e ^^logK (log^) 



, < a < 1, 
1 < a, 



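the complexity of the estimate $\overline{V}_0^{\mathbf{n},\mathbf{k}}$, given that

$$E\big[(\overline{V}_0^{\mathbf{n},\mathbf{k}} - V_0^*)^2\big] \le \varepsilon^2,$$

is bounded, up to a constant, from above by $\mathcal{C}_{\mathbf{n},\mathbf{k}}(\varepsilon)$ from Figure 4.1.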
Figure 4.1: Complexity $\mathcal{C}_{\mathbf{n},\mathbf{k}}(\varepsilon)$ for the multilevel approach.



Discussion Let us compare the complexities of the estimates Vq' k and V n ' k . 
To this end we compute the ratio function 

^n,kO) 



As can be easily seen, the largest complexity gain with 52(e) e 2 up to a log- 
arithmic factor can be, for example, attained in the situation — ft* 0, a > 1. 
which in turn takes place if a = oo and /i > 0, since for all known approxima- 
tion algorithms x < 1. An example of pricing problems where the assumption 
(AM) holds with an arbitrary large a can be found in Belomestny (2011). 

5 Numerical example: Bermudan max calls on multiple assets

Suppose that the price of the underlying asset $X = (X^1, \ldots, X^d)$ follows a geometric Brownian motion (GBM) under the risk-neutral measure, i.e.,

$$dX_t^i = (r - \delta)X_t^i\,dt + \sigma X_t^i\,dB_t^i, \tag{5.1}$$

where $r$ is the risk-free interest rate, $\delta$ the dividend rate, $\sigma$ the volatility, and $B_t = (B_t^1, \ldots, B_t^d)$ is a vector of $d$ independent standard Brownian motions. At any time $t \in \{t_0, \ldots, t_{\mathcal{J}}\}$ the holder of the option may exercise it and receive the payoff

$$h(X_t) = e^{-rt}\big(\max(X_t^1, \ldots, X_t^d) - k\big)^+.$$

We consider a benchmark example (see, e.g., Glasserman (2004), p. 462) with $d = 2$, $t_j = jT/\mathcal{J}$, $j = 0, \ldots, \mathcal{J}$, $T = 3$ and $\mathcal{J} = 9$.
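The discounted max-call payoff can be written as a short vectorised function. The name `max_call_payoff` and the argument names are hypothetical; `strike` plays the role of the constant subtracted inside the positive part and `rate` the role of $r$.

```python
import numpy as np

def max_call_payoff(t, x, strike, rate):
    """Discounted payoff exp(-rate * t) * (max_i x_i - strike)^+ for states x of shape (n, d)."""
    return np.exp(-rate * t) * np.maximum(x.max(axis=1) - strike, 0.0)
```

Each row of `x` holds the $d$ asset prices of one path at time `t`, so the function returns one payoff per path.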

5.1 Mesh method 

Fix some natural numbers $k_0$ and $L$, and define a sequence of natural numbers via

$$k_l = k_0 \times 10^l, \quad l = 0, \ldots, L.$$
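For each $l = 1, \ldots, L$, simulate independently two sets of $k_l$ "training" paths of the process $Z$ using the exact formula

$$Z_j^{(i)} = Z_{j-1}^{(i)} \exp\Big(\Big[r - \delta - \frac{1}{2}\sigma^2\Big](t_j - t_{j-1}) + \sigma\sqrt{t_j - t_{j-1}}\;\xi_j^i\Big),$$

where $\xi_j^i$, $i = 1, \ldots, k_l$, are i.i.d. standard normal random variables. The corresponding transition density is given by

$$p_j(x, y) = \prod_{i=1}^{d} p_j(x_i, y_i), \quad x = (x_1, \ldots, x_d), \quad y = (y_1, \ldots, y_d),$$

where

$$p_j(x_i, y_i) = \frac{1}{y_i\,\sigma\sqrt{2\pi(t_j - t_{j-1})}} \exp\Bigg(-\frac{\big(\log(y_i/x_i) - (r - \delta - \frac{1}{2}\sigma^2)(t_j - t_{j-1})\big)^2}{2\sigma^2(t_j - t_{j-1})}\Bigg).$$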
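The exact GBM step and the lognormal transition density can be sketched as follows. The function names and argument names are hypothetical; `dt` stands for $t_j - t_{j-1}$, and the density is the product over coordinates of the one-dimensional lognormal densities.

```python
import numpy as np

def gbm_step(z, dt, rate, dividend, vol, rng):
    """Advance states z of shape (n, d) by one exercise interval using the exact GBM formula."""
    xi = rng.standard_normal(z.shape)
    drift = (rate - dividend - 0.5 * vol ** 2) * dt
    return z * np.exp(drift + vol * np.sqrt(dt) * xi)

def gbm_density(x, y, dt, rate, dividend, vol):
    """Transition density p_j(x, y) as a product of lognormal densities over the last axis."""
    m = np.log(y / x) - (rate - dividend - 0.5 * vol ** 2) * dt
    p = np.exp(-m ** 2 / (2.0 * vol ** 2 * dt)) / (y * vol * np.sqrt(2.0 * np.pi * dt))
    return p.prod(axis=-1)
```

A function like `gbm_density` (with the model parameters fixed) is the kind of callable that the mesh weights of Example 3 require.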

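Using the above paths we construct the sequence of estimates

$$C_{k_l,0}(x), \ldots, C_{k_l,\mathcal{J}}(x), \quad l = 1, \ldots, L,$$

as described in Example 3. Next fix a sequence of natural numbers $n_0 < n_1 < \ldots < n_L$ and consider the multilevel estimate $\overline{V}_0^{\mathbf{n},\mathbf{k}}$ with

$$\tau_{k_l}^{(r)} = \inf\{0 \le j \le \mathcal{J} : g_j(Z_j^{(r)}) \ge C_{k_l,j}(Z_j^{(r)})\},$$

where $(Z_0^{(r)}, \ldots, Z_{\mathcal{J}}^{(r)})$, $r = 1, \ldots, n_l$, is a set of $n_l$ paths of the process $Z$. Furthermore, one can use one and the same set of $k_l$ "training" paths to estimate both $C_{k_l,j}$ and $C_{k_{l-1},j}$, $l = 1, \ldots, L$. This would reduce both the variance and the complexity of $\overline{V}_0^{\mathbf{n},\mathbf{k}}$. The complexity of the estimate $\overline{V}_0^{\mathbf{n},\mathbf{k}}$ is proportional to

$$\mathcal{C}_L(\mathbf{n}, \mathbf{k}) = n_0 k_0 + \sum_{l=1}^{L} n_l (k_l + k_{l-1})$$

and its variance is given by

$$\mathcal{V}_L(\mathbf{n}, \mathbf{k}) = \frac{\sigma_0^2}{n_0} + \sum_{l=1}^{L} \frac{\sigma_l^2}{n_l}$$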
Table 1: The estimated level variances for different values of $k_L$ together with $\sigma^* = \mathcal{V}_L(\mathbf{n}^*, \mathbf{k})$. The last row contains the values of the estimate $\overline{V}_0^{\mathbf{n},\mathbf{k}}$.

L    quantity      k_L = 2.5 x 10^3    k_L = 5 x 10^3    k_L = 10 x 10^3    k_L = 20 x 10^3
0    $\sigma_0$    12.058              12.084            11.982             11.965
     $\sigma^*$    0.0539              0.0764            0.1071             0.1513
1    $\sigma_0$    11.841              11.963            12.023             12.040
     $\sigma_1$    5.838               5.342             4.622              3.998
     $\sigma^*$    0.0441              0.0593            0.0773             0.1011
2    $\sigma_0$    10.533              10.890            11.416             11.739
     $\sigma_1$    7.015               6.690             6.291              5.828
     $\sigma_2$    5.882               5.274             4.672              3.984
     $\sigma^*$    0.0427              0.0559            0.0727             0.0921
     estimate      7.9799              8.0245            8.0464             8.0678

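with

$$\sigma_0^2 = \mathrm{Var}\big[g_{\tau_{k_0}}(Z_{\tau_{k_0}})\big], \qquad \sigma_l^2 = \mathrm{Var}\big[g_{\tau_{k_l}}(Z_{\tau_{k_l}}) - g_{\tau_{k_{l-1}}}(Z_{\tau_{k_{l-1}}})\big], \quad l = 1, \ldots, L.$$

First we simulate $k_l$ "training" paths and $n = 10000$ "testing" paths, and use 100 repetitions of the "training" and "testing" steps to estimate the level variance $\sigma_l^2$ for all $l = 0, \ldots, L$. The estimated level variances are presented in Table 1. Next, for any $L = 0, 1, 2$, we fix $\mathcal{C}_0 = 125 \times 10^6$ and numerically solve the optimisation problem:

$$\mathbf{n}^* = \arg\min_{\mathbf{n} \in \mathcal{N}} \mathcal{V}_L(\mathbf{n}, \mathbf{k}), \qquad \mathcal{N} = \{\mathbf{n} \in \mathbb{N}^L : \mathcal{C}_L(\mathbf{n}, \mathbf{k}) \le \mathcal{C}_0\}.$$

In this way we find the optimal vector $\mathbf{n}^*$ leading to the smallest variance of the estimate $\overline{V}_0^{\mathbf{n},\mathbf{k}}$ under the budget constraints. The results are presented in Table 1 with $\sigma_L^* = \mathcal{V}_L(\mathbf{n}^*, \mathbf{k})$, $L = 0, 1, 2$. The values of $\overline{V}_0^{\mathbf{n},\mathbf{k}}$ are also given. In Figure 5.1 the corresponding ratios $(\sigma_L^*)^2/(\sigma_0^*)^2$, $L = 0, 1, 2$, are shown.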
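The budget-constrained allocation can be illustrated with a continuous relaxation, in which the $n_l$ are treated as real numbers and the Lagrange condition gives $n_l \propto \sigma_l/\sqrt{c_l}$ with level cost $c_0 = k_0$, $c_l = k_l + k_{l-1}$. This is a sketch under that assumption and not necessarily the numerical procedure used to produce Table 1; the function name `optimal_std` is hypothetical. It returns the square root of the minimal variance.

```python
import math

def optimal_std(sigmas, k0, growth, budget):
    """Continuous relaxation of: min sum_l sigma_l^2 / n_l  s.t.  sum_l n_l * c_l <= budget.

    sigmas : level standard deviations (sigma_0, ..., sigma_L)
    k0     : number of training paths on level 0; k_l = k0 * growth**l
    Returns the real-valued allocation n and the square root of the minimal variance.
    """
    ks = [k0 * growth ** l for l in range(len(sigmas))]
    costs = [ks[0]] + [ks[l] + ks[l - 1] for l in range(1, len(ks))]
    # Lagrange condition: n_l is proportional to sigma_l / sqrt(c_l).
    s = sum(sig * math.sqrt(c) for sig, c in zip(sigmas, costs))
    n = [budget * sig / (math.sqrt(c) * s) for sig, c in zip(sigmas, costs)]
    variance = sum(sig ** 2 / n_l for sig, n_l in zip(sigmas, n))
    return n, math.sqrt(variance)
```

For instance, `optimal_std([11.841, 5.838], 250, 10, 125e6)` uses the $L = 1$, $k_L = 2.5 \times 10^3$ column of Table 1 and returns a value close to the tabulated $\sigma^*$ of about 0.044, which suggests that the tabulated $\sigma^*$ is on the scale of a standard deviation.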

5.2 Importance sampling 

One can significantly improve the efficiency of the multilevel approach by applying the importance sampling technique. Let us fix some $l > 0$ and look at the distribution of the r.v.

$$\Delta_l = h_{\tau_l}(X_{\tau_l}) - h_{\tau_{l-1}}(X_{\tau_{l-1}}).$$

As can be seen from Figure 5.2, $\Delta_l$ vanishes for 80%-90% of the "testing" paths in our example, and this motivates the application of the importance sampling technique.

Figure 5.1: Variance reduction effect of the ML approach: the ratio of variances $(\sigma_0^*)^2/(\sigma_L^*)^2$ for $L = 0, 1, 2$.

First we change the measure from $P$ to $Q$ via

$$\frac{dQ}{dP}(\omega) = \begin{cases} 1/P(\mathcal{A}_l), & \omega \in \mathcal{A}_l, \\ 0, & \omega \notin \mathcal{A}_l, \end{cases}$$

where

$$\mathcal{A}_l = \{\tau_l \ne \tau_{l-1}\}.$$

Now, for a set of testing paths $X^{(r)}$, $r = 1, \ldots, n_l$, generated under $Q$, the unbiased Monte Carlo estimator for $E_P[\Delta_l]$ is given by

$$\hat{\Delta}_l^n = \frac{P(\mathcal{A}_l)}{n_l}\sum_{r=1}^{n_l}\Big\{h_{\tau_l}\big(X^{(r)}_{\tau_l}\big) - h_{\tau_{l-1}}\big(X^{(r)}_{\tau_{l-1}}\big)\Big\}. \tag{5.2}$$

Figure 5.2: Histogram of the r.v. $\Delta_l$ based on 10000 realisations. An atom at 0 is clearly visible.



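Moreover, it holds

$$\begin{aligned}
\mathrm{Var}_Q[\Delta_l] &= E_Q[\Delta_l^2] - E_Q^2[\Delta_l] \\
&= E_P\Big[\frac{dQ}{dP}\Delta_l^2\Big] - E_P^2\Big[\frac{dQ}{dP}\Delta_l\Big] \\
&= \frac{1}{P(\mathcal{A}_l)}E_P[\Delta_l^2] - \frac{1}{P(\mathcal{A}_l)^2}E_P^2[\Delta_l] \\
&= \frac{1}{P(\mathcal{A}_l)}\big(\mathrm{Var}_P[\Delta_l] + E_P^2[\Delta_l]\big) - \frac{1}{P(\mathcal{A}_l)^2}E_P^2[\Delta_l] \\
&= \frac{1}{P(\mathcal{A}_l)}\mathrm{Var}_P[\Delta_l] + \underbrace{\frac{P(\mathcal{A}_l) - 1}{P(\mathcal{A}_l)^2}E_P^2[\Delta_l]}_{\le 0}
\end{aligned} \tag{5.3}$$

and as a consequence

$$\mathrm{Var}_Q[\Delta_l] \le \frac{1}{P(\mathcal{A}_l)}\mathrm{Var}_P[\Delta_l].$$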
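The variance identity can be checked exactly on a toy discrete distribution. The sketch below assumes that $\Delta_l$ vanishes outside an event of probability `p_a` and that $Q$ is $P$ conditioned on that event, so that $E_P[f(\Delta_l)] = P(\mathcal{A}_l)\,E_Q[f(\Delta_l)]$ for $f(0) = 0$; the function name and arguments are hypothetical.

```python
def variance_bound_check(values, probs, p_a):
    """Return (Var_Q[Delta], Var_P[Delta] / P(A)) for a discrete Delta.

    values, probs : distribution of Delta under Q (i.e. conditional on the event A)
    p_a           : P(A); Delta is zero outside A
    """
    m1 = sum(v * q for v, q in zip(values, probs))      # E_Q[Delta]
    m2 = sum(v * v * q for v, q in zip(values, probs))  # E_Q[Delta^2]
    var_q = m2 - m1 ** 2
    # Moments under P follow from E_P[.] = P(A) * E_Q[.] since Delta = 0 off A.
    var_p = p_a * m2 - (p_a * m1) ** 2
    return var_q, var_p / p_a
```

The gap between the two returned numbers equals $(1 - P(\mathcal{A}_l))\,E_Q^2[\Delta_l]$, so the bound is tight when the mean of $\Delta_l$ is small.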
In fact, the last inequality is quite tight, as $E_P^2[\Delta_l]$ in (5.3) will be negligible. As a result we have

$$\mathrm{Var}_Q[\hat{\Delta}_l^n] \le P(\mathcal{A}_l)\,\frac{\sigma_l^2}{n_l}, \tag{5.4}$$

meaning that importance sampling reduces the variance at least by the factor $P(\mathcal{A}_l)$. Now we calculate $\sigma^*$ again by optimisation of $n_1, \ldots, n_L$, where we use $P(\mathcal{A}_l)\sigma_l^2$ instead of $\sigma_l^2$ for $l = 1, \ldots, L$. The new ratios $(\sigma_L^*)^2/(\sigma_0^*)^2$ are shown in Figure 5.3.

Figure 5.3: Variance reduction effect of the ML approach enhanced with importance sampling: the ratio of variances $(\sigma_0^*)^2/(\sigma_L^*)^2$ for $L = 0, 1, 2$.

As $\mathcal{A}_l$ is not known explicitly, sampling from $Q$ is not directly possible. We apply the following strategy to obtain testing paths $Y^{(r)}$, $r = 1, \ldots, n_l$, that have approximately the distribution $Q$. We fix a natural number $1 < R < n_l$ and start to generate trajectories $Y^{(1)}, Y^{(2)}, \ldots$ under $P$. If a path $Y^{(q)}$ leads to different stopping times (i.e. it enters the symmetric difference of the exercise regions associated with $\tau_l$ and $\tau_{l-1}$ at time step $s$), we generate the next $R$ paths $Y^{(q+1)}, \ldots, Y^{(q+R)}$ starting from $Y_s^{(q)}$ at time $s$. This ensures that those paths will also lead to different stopping times. Afterwards we proceed again by generating paths $Y^{(q+R+1)}, \ldots$ under $P$, and so on. In summary, the algorithm is described below.

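1. Set $k := 0$, $\tilde{k} := 0$, $r := 0$ and repeat the following steps while $r < n_l$:

(a) Set $r := r + 1$ and generate $Y^{(r)} \sim P$.

(b) Calculate $\tau_l^{(r)} := \tau_l(Y^{(r)})$ and $\tau_{l-1}^{(r)} := \tau_{l-1}(Y^{(r)})$ and set

$$s := \min\{\tau_l^{(r)}, \tau_{l-1}^{(r)}\}.$$

(c) If $\tau_l^{(r)} = \tau_{l-1}^{(r)}$, set $k := k + 1$ and go to (a). Otherwise set $\tilde{k} := \tilde{k} + 1$ and repeat step (d) $R$ times:

(d) Set $r := r + 1$ and generate the trajectory $Y^{(r)}$ via

$$Y_t^{(r)} := \begin{cases} Y_t^{(r-1)}, & t \le s, \\ \tilde{Y}_t, & t > s, \end{cases} \tag{5.5}$$

where $\tilde{Y}$ is a trajectory starting from time $s$ at $Y_s^{(r-1)}$ generated under $P(\,\cdot \mid Y_0^{(r-1)}, \ldots, Y_s^{(r-1)})$. Calculate $\tau_l^{(r)} := \tau_l(Y^{(r)})$ and $\tau_{l-1}^{(r)} := \tau_{l-1}(Y^{(r)})$.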

2. Define an estimator A" for E P [A ! ] by 

1 1 



3. Estimate $P(\mathcal{A}_l)$ by

$$\hat{P} = \frac{\tilde{k}}{k + \tilde{k}}.$$

6 Proofs 

6.1 Proof of Theorem 4 

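A family of stopping times $(\tau_j)_{j=0,\ldots,\mathcal{J}}$ w.r.t. the filtration $(\mathcal{F}_j)_{j=0,\ldots,\mathcal{J}}$ is called consistent if

$$j \le \tau_j \le \mathcal{J}, \quad \tau_{\mathcal{J}} = \mathcal{J}$$

and

$$\tau_j > j \;\Rightarrow\; \tau_j = \tau_{j+1}.$$

Lemma 8. Let $(Y_j)_{j=0,\ldots,\mathcal{J}}$ be a process adapted to the filtration $(\mathcal{F}_j)_{j=0,\ldots,\mathcal{J}}$ and let $(\tau_j^1)$ and $(\tau_j^2)$ be two consistent families of stopping times. Then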

f 

= < 



for any j = 0,...,J - 1. 






Proof. We have 



Yj-Y Tl 



+ 



Yj - y T i 



1 {T}>j, T J>j} 



T j+1 J 



Vi T j+i 



1 {Tj=;,T 2 >j} + 



T j+1 T j+1 



1 {T)>j,T*>j}- 



Therefore it holds for A ; = E^j [Y t i 



with A « = and 



A ; = Y?i I 



1=3 



□ 



Introduce 



and 



r=l 
N 



r=l A 



V ' is an estimate computed using "exact" paths in the "training" step and 

N K 

V ' is an estimate based on "exact" paths in both "training" and "testing" 
steps. Then 



v -z[vr 



< 



+ 



E 


[<*' 






E 






yn 



+ 



e[v ]-e[v ^] 



— Ri+R 2 +^3- 
Estimate for $R_1$: First suppose for concreteness that $h_K > \gamma_K$. We have



< GE 



so 

1=0 



{^K,l=U^K,l>l} + 1 {t 



where 
and 

Introduce 



t kj = M{1 <j<J: gj{Z hK ,j) > C K j(Zh K ,j)} 
t k>! = inf{Z <j<J: gj (Zj) > C KJ (Zj)}. 

<&i = Igj(Zj) > C KJ (Z } ), gj (z hK>j ) < C KJ (Z hK j)} 
U { gj (Zj) < C Kij (Zj), gj (z hKj ) > c KJ (z hKj )}, 

■*i,o = jo < \gjiZj) - C K j{Z})\ < k v^log i- j , 

j^KV^log^ Ig^O-C^)! <2 l /cv/^log^J 



R 1 < GE 



for j = 0, . . . ,J - 1, and i > 0. It holds 

. Z=0 
" oo /-l 

EE 1 ! 

l_;=o i=o 
Further, denote 

% = {ki(^)-C^(Z ; )|<2 i K V / ^log^|, 

^ ;i = 1 1 C KJ (Z hKj ) - C K A (Z z ) I > 2 ; " 1 k: v 7 ^ log ^ } , 
<^*. = ||c*(Z hjrJ ) - C ( *(Z ; )| > 2 ; " 2 K v^log ^} , 
Sfe,/ = {sup \c K ,ib) ~ Cfa)\ < 2}- 3 Ky/h^log^j e J?*. 
Then it holds



<P^|g ! (Z ( )-C i *(Z i )|<2K0^1og — 

+ e p*(jc^(zo-c;ra|>K^i°g^) 



due to (AM). Analogously, using the fact that $|g_l(Z_l) - C_{K,l}(Z_l)| \le |C_{K,l}(Z_l) - C_{K,l}(Z_{h_K,l})|$ on $\mathcal{E}_l$, we get



<E 
<E 



f n51,j} + 1 K,n J5 i' u n^ i _ i } 



1 {% i n© i _ i n5i ji } + 1{^ ;>[ } 

= p(^* i n@* i )+P K (^ u ). 



Since 



P X OY;,;)<Biexp( -2 ! - J B 2 Klog— ) <B!h^ , 



for > 2a, we get 

00 J-i 



i=l !=0 J i=l ;=o v Ky 



Furthermore, 



p n ®u) = p (\ z h K ,i ~Zl\> &c ^K^log i-, |g,(Z z ) - C ; *(Z ; )| < 2 i+1 K^log i-) 



P Z hjt>z -Z z >2>- 1 2 i - 2 K S /h K log — 



1 



where 



p(@* ; ) <2^ +1 W£ /2 log a -i. 



It remains to show that 

00 /-1 



E 2ia E P |^-Z ! |>2 i - 2 i?- 1 0^1og- 



The Markov inequality implies for any p > 



Pi ||Z hK;i -Z i ||>2 i - 2 ^-V^log 



I <^2"f (; - 2) - 



■J/2 
and it is enough to prove that



sup {Zk j-ZA 



l.i 



p/2 



(6.1) 



for any p > and some constant C p > 0. Under the condition 



E 


1 rr \P 
sup \Zj\ 






J<j<J 





< oo 



the inequality (6.1) follows from the well known results on the strong conver- 
gence of discretisation schemes, see e.g., Kloeden and Platen (1992), Section 
10.6. 



Estimate for R 2 : It holds 



R 9 = 



sup \\Zj-Z h A 

j=o,-J 



< ^h K , K^oo. 
Estimate for R 3 : Taking into account that 

Cf(Zi) = E* [grjJZ^J <S,(Z I ) 

on {t^ = 1} and 



on {zl > I}, we get from Lemma 8 



Ro = 



< E 



Z=0 



Introduce 



*/ = {gj(Zj) > CJjCZj), g ; (Z ; ) < C KJ (Z ; )} 
U {g ; (Z ; ) < qCZj-), gj(Zj) > C Kij (Zj)}, 



< 



gjtz^-qazj) 

gjiZ^-CjiZj) 



, -1/2 



1/2 
for $j = 0, \ldots, \mathcal{J} - 1$ and $i > 0$. It holds

$R_3$

f-i 



H Ic'Czo-ftCzoli^} 
. 1=0 

.1=0 1=0 

= r- K 1/2 S lp (l^-W)|<^ 1/2 ) 

|c z *(z I )-gj(z z )|i Wnj/u} 



+E 



i=l !=0 



Using the fact that gj(Z;) — C ; *(Z;)| < |C;(Z ; ) — C ; *(Z ; )| on §1, we derive 

R 3 < rK 1/2 EKl g ' (z < )_c < (Zi) l- r K 1/2 ) 



;=o 



i=l 



T K E 



>"1 

. 1=0 



{|*y(f,)-C7(Z,)|<2'ri 1/a } 



1/2 |P K (Ic^czo-c^z,)^^" 1 ^ 172 ) 



< A/r" (1+a)/2 + A/ r - (1+a)/2 J] 2% expC-B^- 1 ). 

i=l 

6.2 Proof of Theorem 6 

The proof follows the same lines as that of Theorem 4.

6.3 Proof of Theorem 7 

In order to simplify the notation, we use the index $l$ instead of $k_l$. Also, we write $x \lesssim y$ if there exists a constant $c > 0$ that does not depend on $\beta$, $L$, $N_0, \ldots, N_L$ such that $x \le c \cdot y$. Moreover, $x \gtrsim y$ means $y \lesssim x$, and $x \asymp y$ stands for $x \lesssim y$ and $x \gtrsim y$. Let us consider the following optimization problem:



Skfriih, 1 — » min 



(6.2) 



with constraints 



r^+^iog^v^iog^l^^ + hf 



< 



1 L 



;=i 



n-i lo § 2 — v h i-i lo § 2 7 — 
ri-i fy-i 



1 \ a/2 



+ h l _ 1 ) / n| <e 2 
We start with some simplifications. First of all, based on the special type of the functional (6.2), we modify the constraints for the bias as



1 1 7 1 



,h 



12 



and for the variance 



max \ 



. a/2 



i=i 



i=i 



which immediately implies $n_0 \asymp \varepsilon^{-2}$. We will always assume that $k_0$ is sufficiently large, so that $\gamma_{l-1}\log^2\frac{1}{\gamma_{l-1}}$ and $h_{l-1}\log^2\frac{1}{h_{l-1}}$ are monotone functions with respect to $l$. Moreover, our analysis is carried out for sufficiently small $\varepsilon$. Now we can start the optimization procedure. We solve the problem in several steps.

Step 1. $\beta \le \mu$. We have $\beta \le \mu \Rightarrow h_l \ge \gamma_l$, so the constraints can be rewritten as



maxjhf log«l,hf}x 



h a L /2 log a ±<hl /2 ^ e , ifa>l 
ftJ^/i^W^-e, ifo<l 



and 



max \ 



y ~ e 



n;~e 2 Lmax|/i"^log a - — , 



Now we will transform (6.2) via 

L L 



~ ^L^^maxtelog"^-,^!}^ 1 
^ e - 2 I |] max j/if" 1 log" i l}, 



which leads us to the three cases: 
1. $\alpha > 2$



e~ 2 L 



g k 1x max j^ 2 " 1 log a i 1 J < ^Lk 1 * x e " 2 " " l g K e -\ 



given j8 = ju and L — - log K e 1 



2. 1 < a < 2 



s " 2 L g k 1 * max jh" 72 " 1 log" pl}< s~ 2 Lk l ^ x+ ^log a K s~ 2 

x e~ 2 ~ 2 ^~? Lk Lx log" e" 2 ^ £ " 4+a - f log^ <■ - 1 
given = jU and L = - log K £ _1 
3. 0<a<l 

In this case, we start from analyzing the constraint for the bias error. It 
holds 

h" /2 io g « ix e ^ apr 2 K Lp ~ e~i. 

h 

Based on the trivial fact, that log^y + (m + l)log K (log K y) is an upper 
bound for the solution x = x(y) of the equation 



— = y, k>1, y>l, meN, 

we have 

log K e~ +21og K (logoff") < L/3 <log^e^" +31og^ (log^e^) . 
The latter inequality gives: 

.V^maxlh^logV,!} £ <r 2 LK^ + %" /2 log" ±- 

1=0 I h ! J 

< g-^Ilfc^log^- 1 

< £ « C« logoff 



e" 2 L 



under the choice /3 = pi and L = ^ log K £ 1 + ^ log K (log^ e a j "| , 
Step 2. $\beta > \mu$. We have $\beta > \mu \Rightarrow h_l < \gamma_l$, so the constraints can be rewritten as



max 



| r «/2 log « _L ft l/2 j = max J K - W2 lQg a ^ K -L^/2 J ^ g 



and 



max \ 



, a/2-i a 1 r 



(=1 



n, 



^maxjr^log"^-,^!. 
Now we will transform (6.2) in the same way, as we did before: 

^kfahj" 1 * e - 2 Lj]^max{ r ^log«— , h^W 1 ~ 
i=o ;=o v ri-i ) 

- e~ 2 L ^ i K lx maxl [ K l ^ a/2 hog a —, l| . 
;=o v Ti J 

It is clear, that jc'^~^ a / 2 Hog a — > 1, if a < 2. Once again, we consider 



several cases 

1. 0<a<l=>/3> ixa. We have 

1/2 < a/2, a 



Tl 



2 3 2 

L < — log K e -1 + - log K (log K e^J 



and 



/=0 



,x+n l + - 



e -i Lk l{ X +p) < g- 1 - 2 ^ i og ;' m e -i ; 



given /3 = pi and L = 

2. 1 < a < 2 and /3 > jua > /i 

Just like in the previous case, we have 

2 . 3 

— log K .£ x + - 



log K e 1 + ^\og K (log,,*? a^j 



2 _, 3 / 

L < — log K . e + - log K log K e - 
ua u y J 
and hence



1=0 



2x 1 + 



x e- 1 Lk l ^ x+ ^ <e~ 3 ~^log K v e~ l 

given [3 = pta and L = J log, e' 1 + log, (log, s^J 

3. 1 < a < 2 and pi < < ;ixa => 2/3 - pia > 0. Imposed constraints lead to 
the bias estimate: 



Y a L 12 log" - < h 1 ^ x^L^x log, e" 



1/2 



Yl 



So for the total complexity we have 



;=o 



g 3 ^ log, e 1 , if x — ^ > given [3 = /ia 

— 

" log, e , if x - < given /3 = /i 



-4+a- 



and L 



4. a > 2 and /3 > /ia. 

The answer will be the same, as in the case 2. 

5. a > 2 and pia > [3 > ju. 

In the same way, as in the case 3 we have L/3 ~ log, e~ 2 and for the 
complexity estimate we consider two cases: 



L L i 

^fc^njfr" 1 ~ g~ 2 Ly"V*maxi k 
;=o ;=o ^ 



< 



£~ 2 ^ Ef=o K i( * + ^ a/2) log" i else 
V 2 Lk l *, i£p<f 

g -2 LK L(x+/5-^a/2) log a_L ; ^ 



< < 



_4x 



e ^ log^ +a e" 1 , if ^ < 1, given fl 



o 2x 1+- 



fxa 
~2~ 



log, " e-\ if g > 1, given = //a 



and L = 



log K e 1 






After gathering the results from Steps 1 and 2 we obtain the complexity given in Figure 4.1.